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Single-Server Design 


HTTP server 
Database 
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Single point of failure! 
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Separating out the database 
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Vertical scaling 
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Servers only 
get so large,
and you still
have single
points of failure.
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Horizontal scaling 


No 
This is easier if 
your web 
servers are 
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Choose the simplest architecture that meets your 
projected traffic requirements. 


But no simpler. 
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Side note: Where do 
servers come from? 


Provisioned within your own company's 
data centers 


Cloud services (e.g., Amazon EC2, Google
Compute Engine, Azure VM's)


Fully managed "serverless" services (e.g.,
Lambda, Kinesis, Athena)


SCALING THE DATABASE 
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Failover servers: Cold Standby 
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Failover servers: Warm Standby 
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Failover servers: Hot Standby 
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Horizontal Scaling of Databases: Sharding 
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Shard 1 


Client & request router 


Shard 2, Shard 3


Shard 2 backup, Shard 3 backup
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More Specific Example: MongoDB 


Diagram: App Server Process -> mongos -> three replica sets (each with one PRIMARY and several SECONDARY nodes), with Config Servers alongside


RS = "Replica Set"


RS1: users 
Min -> 1000 


RS2: users 
1000 -> 5000 


RS3: users 
5000 -> max 
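
The range-based routing in the diagram can be sketched in a few lines. This is only an illustration: the boundary handling (1000 and 5000 treated as the first ID of the next range) and the names `SHARD_UPPER_BOUNDS`, `SHARD_NAMES`, and `route_user` are assumptions, not how mongos is actually implemented.

```python
from bisect import bisect_right

# Hypothetical boundaries read off the slide: user IDs below 1000 go to RS1,
# 1000 through 4999 go to RS2, and 5000 and above go to RS3.
SHARD_UPPER_BOUNDS = [1000, 5000]   # exclusive upper bound of RS1 and RS2
SHARD_NAMES = ["RS1", "RS2", "RS3"]


def route_user(user_id: int) -> str:
    """Return the name of the replica set that owns this user ID."""
    # bisect_right counts how many boundaries are <= user_id,
    # which is exactly the index of the owning range.
    return SHARD_NAMES[bisect_right(SHARD_UPPER_BOUNDS, user_id)]
```

A router like this also makes the hotspot and resharding problems visible: if most active users fall in one range, one replica set takes most of the traffic, and moving a boundary means moving data.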
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More Specific Example: Cassandra 


cassandra

Sharded databases are sometimes called "NoSQL"

* Tough to do joins across shards.
* Resharding
* Hotspots
* Many "NoSQL" databases do support a SQL-like query language as their API.
* Still works best with simple key/value lookups.
* A formal schema may not be needed.
* Examples: MongoDB, DynamoDB, Cassandra, HBase


Denormalizing 
NORMALIZED DATA: Less storage space, more lookups, updates in one place


ID | Customer ID | Time
1 | 123 | 6:30
2 | 451 | 7:00
3 | 123 | 8:00

Customer ID | Name | Phone
123 | Frank Kane | 555-1234
451 | John Smith | 555-5233


DENORMALIZED DATA: More storage space, one lookup, updates are hard


ID | Customer ID | Name | Phone | Time
1 | 123 | Frank Kane | 555-1234 | 6:30
2 | 451 | John Smith | 555-5233 | 7:00
3 | 123 | Frank Kane | 555-1234 | 8:00
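
To make the tradeoff concrete, here is a small sketch using the rows from the tables above. The names `customers`, `visits`, `denormalize`, and `update_phone_denormalized` are made up for this illustration, and plain dictionaries stand in for real database tables.

```python
# Normalized: customer details live in exactly one place.
customers = {
    123: {"name": "Frank Kane", "phone": "555-1234"},
    451: {"name": "John Smith", "phone": "555-5233"},
}
visits = [
    {"id": 1, "customer_id": 123, "time": "6:30"},
    {"id": 2, "customer_id": 451, "time": "7:00"},
    {"id": 3, "customer_id": 123, "time": "8:00"},
]


def denormalize(visits, customers):
    """Copy the customer fields into every visit row (one lookup per read)."""
    return [{**visit, **customers[visit["customer_id"]]} for visit in visits]


def update_phone_denormalized(rows, customer_id, phone):
    """Update every copy of the phone number; returns how many rows changed."""
    touched = 0
    for row in rows:
        if row["customer_id"] == customer_id:
            row["phone"] = phone
            touched += 1
    return touched
```

In the normalized form a phone change is a single write to `customers`; in the denormalized form the same change has to touch every row that carries a copy.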
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Another approach is to just throw data into text files (csv, json 
perhaps) into a big distributed storage system like Amazon S3 


* This is called a "data lake"
* Common approach for "big data" and unstructured data
* Another process (e.g., AWS Glue) creates a schema for that data
* And cloud-based features let you query the data.
  * Amazon Athena (serverless)
  * Amazon Redshift (distributed data warehouse)
* You still need to think about how to partition the raw data for best
performance.


Cloud Solutions / Data Lakes


Diagram labels: Amazon Simple Storage Service (Amazon S3), AWS Glue, Amazon Redshift


ACID 
Compliance 


Atomicity: Either the entire transaction 
succeeds, or the entire thing fails. 


Consistency: All database rules are 
enforced, or the entire transaction is 
rolled back. 


Isolation: No transaction is affected by 
any other transaction that is still in 
progress. 


Durability: Once a transaction is
committed, it stays, even if the system
crashes immediately after.


The CAP Theorem 


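Diagram: a triangle with Availability, Consistency, and Partition-Tolerance at its corners. MySQL sits between Consistency and Availability; Cassandra sits between Availability and Partition-Tolerance; Apache HBase, MongoDB, and Amazon DynamoDB (in strongly consistent mode) sit between Consistency and Partition-Tolerance.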
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MongoDB: Single Master; trades off availability 


Diagram: App Server Process -> mongos -> three replica sets, each with several SECONDARY nodes, with Config Servers alongside


RS = "Replica Set"


RS1: users 
Min -> 1000 


RS2: users 
1000 -> 5000 


RS3: users 
5000 -> max 
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Cassandra: No single master, eventually consistent 


cassandra 
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Be sure to understand requirements 
about scale, consistency, and availability 
before proposing a specific database 
solution 


ASK QUESTIONS 
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Caching Layers 


Diagram labels: Load Balancer, Caching Layer
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How Caches Work 


Horizontally scaled servers 

Clients hash requests to a given server
In-memory (fast)

Appropriate for applications with more reads
than writes

The expiration policy dictates how long data is
cached. Too long and your data may go stale;
too short and the cache won't do much good.
Hotspots can be a problem (the "celebrity
problem")

Cold-start is also a problem. How do you 
initially warm up the cache without bringing 
down whatever you are caching? 
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Cache servers 


Eviction Policies 


* LRU: Least Recently Used
* LFU: Least Frequently Used
* FIFO: First In First Out


Diagram: a HashMap whose keys (Key 1, Key 2, Key 3, ...) point to the cached entries

Sample LRU data architecture
(for a given shard)
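
A minimal sketch of an LRU cache for a single shard, assuming Python's `collections.OrderedDict` as the structure that combines key lookup with recency ordering. The class name `LRUCache` and its `get` / `put` methods are illustrative, not taken from any particular caching product.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as a use, so move the entry to the "recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict from the "old" end.
            self._items.popitem(last=False)
```

Both operations stay O(1) because finding an entry and reordering it never require scanning the whole cache.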
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A Few Caching Technologies 


Memcached
* In-memory key/value store
* Open source

Redis
* Adds more features
* Snapshots, replication, transactions, pub/sub
* Advanced data structures
* More complex in general

NCache
* Made for .NET, Java, Node.js

Ehcache
* Java
* Just a distributed Map really

ElastiCache
* Amazon Web Services (AWS) solution
* Fully-managed Redis or Memcached
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Content Delivery Networks (CDNs) 


* Geographically distributed
* Local hosting of
  * HTML
  * JavaScript
  * Images
* Some limited computation may be available as well
* Mainly useful for static content, such as images or static web pages.
* You probably won't be asked to design a static web page, though!


Diagram labels: US, EU, IN, ...; Caching Layer
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CDN Providers 


Google Cloud CDN
Microsoft Azure CDN


We've already talked about backup hosts 


Diagram: three PRIMARY hosts, each backed by several SECONDARY hosts


...But what about a real disaster?
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Things that can fail 


* A single server
* An entire rack
* An entire data center (AKA "availability zone")
* An entire region
* Anything more, and you have bigger problems...
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Be smart about 
distributing your servers 


* Secondaries should be spread across
multiple racks, availability zones, and
regions

* Make sure your system has enough 
capacity to survive a failure at any 
reasonable scale 


* This means overprovisioning 


* You may need to balance budget vs. 
availability. Not every system warrants 
this. 


* Provisioning a new server from an 
offsite backup might be good enough. 


* Again, ask questions!


DISTRIBUTED STORAGE


Distributed storage 
solutions 


* Services for scalable, available, 
secure, fast object storage 


* Use cases: "data lakes", 
websites, backups, "big data" 


* Highly durable: 


o Amazon S3 offers
99.999999999% durability!


A brief diversion about SLA's


* What do we mean by 99.999999999% durability?
  * This is a percentile
  * This one is "11 9's" of durability
  * Meaning: there is a 0.000000001% chance of losing a given object in a year with S3.
* This can also be applied to latency, or how quickly a service responds to a request.
  * For example: you can say your "3 nines" latency is 100ms, meaning that 99.9% of requests come back within 100ms.
* Availability SLA's can be deceiving...
  * 99% availability would still result in 3.65 DAYS of downtime in a year
  * Whereas 99.9999% (6 nines) would result in about 30 seconds of downtime in a year
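
The downtime figures quoted for availability SLA's are simple arithmetic, sketched below. The function name `downtime_seconds_per_year` and the 365-day year are assumptions made for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumes a non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Seconds per year a service may be down and still meet its SLA."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


two_nines_days = downtime_seconds_per_year(99) / SECONDS_PER_DAY   # roughly 3.65 days
six_nines_seconds = downtime_seconds_per_year(99.9999)             # roughly 31.5 seconds
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the gap between "99%" and "six nines" is so large.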


Distributed storage solutions


* Amazon S3
  * Different tiers, e.g., Glacier for archiving is cheaper, but harder to read from. You can also choose the amount of redundancy you need to save money.
  * Hot / cool / cold storage
* Google Cloud Storage
* Microsoft Azure
* Hadoop HDFS
  * Typically self-hosted
* Then there are all the consumer-oriented storage solutions
  * Dropbox, Box, Google Drive, iCloud, OneDrive, etc.
  * Generally not relevant to system design


Example: HDFS architecture 


Diagram: the Client talks to the Name Node, which keeps the metadata (name, replicas, locations); data nodes sit in Rack 1 and Rack 2
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Files are broken up into “blocks” 
replicated across your cluster 
Replication is rack-aware 

A master "name node" 
coordinates all operations 
Clients try to read from nearest 
replica 

Writes get replicated across 
different racks 

For high availability, there may 
be 3 or more name nodes to fall 
back on, and a highly available 
data store for metadata 
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Algorithm and Data 
Structures: A Review 




Singly Linked List 


Head 
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Null 


Grows dynamically (as opposed to an array) 

Access is O(n) 

Insert at head is O(1)

Insert at end is O(n) 

Best for use cases that involve sequential access 

Also good for stacks (LIFO), or queues (FIFO) if you keep track of the tail as 
well 

One pointer per node = low memory requirements 
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Doubly Linked List 


BY FRANK KANE 


Head Tail 


Each node has a "next" and "previous" pointer 

Insert at front or back is O(1) 

Access is still O(n) (but could be faster in practice, since you can start at 
either end) 

Useful for Deques 

MRU: always move most recent access to the head


Binary Tree


* Each element has up to two children: a left and a right
* If the left and right are ordered (i.e., left means "less than") it is a binary
search tree
* Access is O(log(n)) on average, O(n) worst case
* Insert / delete also O(log(n)) as you need to do another search to rearrange
things
* Mostly used in cases where you need to do in-order traversals
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Hash Table 
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Hash function 


Buckets 


Lists (or something) 


A "hash function" quickly maps some key to a bucket 

That bucket is then searched for the key's value 

Hash collisions occur when more than one key maps to the same bucket 
Inserts, lookups and deletions are O(1)... but O(n) in the worst case. 
Used when fast lookups are needed 
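
The bucket-and-list picture above can be sketched as a small chained hash table. The class name `ChainedHashTable` and the fixed bucket count are assumptions for this illustration; real implementations also resize as they fill up.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by keeping a list per bucket."""

    def __init__(self, bucket_count: int = 8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash function maps the key to one of the buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only the one bucket is searched, not the whole table.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def delete(self, key) -> bool:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return True
        return False
```

With a single bucket every key collides and each operation degrades to a linear scan, which is the O(n) worst case mentioned above.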


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Graph 


* Consists of nodes (vertices) that can be connected in arbitrary ways (edges)

* For example, friends in a social network, paths in a city, networks in general.

* Traversal strategies include Breadth-First-Search (BFS) and Depth-First-Search (DFS)

* Traversal is O(V+E)
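
As a quick illustration of BFS, here is a sketch over a graph stored as an adjacency list (a dictionary mapping each node to its neighbors). The function name `bfs_order` and the sample representation are assumptions for this example.

```python
from collections import deque


def bfs_order(graph, start):
    """Return nodes in the order breadth-first search visits them."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()          # FIFO queue gives level-by-level order
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)   # mark when enqueued to avoid duplicates
                queue.append(neighbor)
    return order
```

Every vertex is enqueued once and every edge is examined once, which is where the O(V+E) cost comes from. Swapping the queue for a stack would give a depth-first order instead.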
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Linear Search 
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€ 


Start with an array (or list) 
This example is sorted but it doesn't have to be 


Start at the beginning and keep going until you find what you're looking for. 


O(n) 


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Binary Search 
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Start with a sorted array (or list) 

Start at the middle, split the array in 2

If what you're looking for is bigger, move to the second half of the array
* Or if it's smaller, move to the first half.

Check the middle, split the half you're looking at again if necessary

Repeat until you find it

O(log(n))
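
The steps above can be sketched as follows. The function name `binary_search` and the convention of returning -1 when the value is absent are choices made for this example.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # target is bigger: keep the second half
        else:
            high = mid - 1   # target is smaller: keep the first half
    return -1
```

Each pass halves the remaining range, so a million sorted items need at most about 20 comparisons.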
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Sorting 


Algorithms 


Unlikely to come up in the context of system design, so a
quick review


Insertion sort: O(n) best case, O(n^2) worst case
* OK for small or mostly-sorted lists

Merge Sort: O(n log(n))
* Scales well to large lists

Quicksort: O(n log(n))
* Very fast, unless you hit the worst case scenario of O(n^2) due to poor choice of a pivot point
* Some complex implementations to avoid that

Bubble sort: O(n^2)
* Simple but inefficient

Many others...


Search and Information Retrieval 


General recipe: 
* Start with a forward index of keywords in each document
  * i.e., Document ID 123 => "the", "quick", "red", "fox"
  * Problems: capitalization, spaces, punctuation, offensive terms, phrases
  * Other signals of relevance can be included in addition to position (formatting, etc.)
* Then generate an inverted index that maps keywords to documents
  * Somehow those documents need to be ranked
  * Could just be a function of how often the keyword appears and where
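
A minimal sketch of the recipe above: tokenize each document, then build an inverted index of (document ID, position) tuples. The function names and the very simple tokenizer (lowercase, letters and digits only) are assumptions; a real system would handle phrases, stop words, and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each keyword to a list of (document ID, position) tuples."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].append((doc_id, position))
    return index


def documents_containing(index, word):
    """Return the sorted IDs of documents that contain the word."""
    return sorted({doc_id for doc_id, _ in index.get(word.lower(), [])})
```

Keeping positions in the index is what later allows ranking by where and how often a keyword appears.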


Document ID, Position tuples

"Palm tree" | (432,1), (36,1235), (432,55)
"Dinosaur" | (22,2), (22,253), (724,4342), (552,793)

Document IDs

"Palm tree" | 432, 36
"Dinosaur" | 22, 552, 724
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TF-IDF: Document search 


* Stands for Term Frequency and Inverse Document Frequency
* Important data for search: figures out what terms are most relevant for a document
* Sounds fancy!
* But it's really one of the oldest and most basic search algorithms.


TF-IDF Explained


* Term Frequency just measures how often a word occurs in a document
  * A word that occurs frequently is probably important to that document's meaning
* Document Frequency is how often a word occurs in an entire set of documents, i.e., all of Wikipedia or every web page
  * This tells us about common words that just appear everywhere no matter what the topic, like "a", "the", "and", etc.
* So a measure of the relevancy of a word to a document might be:

Term Frequency / Document Frequency

Or: Term Frequency * Inverse Document Frequency

That is, take how often the word appears in a document, over how
often it just appears everywhere. That gives you a measure of how
important and unique this word is for this document


Applying TF-IDF to Search


* A very simple search algorithm could be:
  * Compute TF-IDF for every word in a corpus
  * For a given search word, sort the documents by their TF-IDF score for that word
  * Display the results
* Note computing "document frequency" can be an intractable problem if we're talking about the entire web.
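
The simple search algorithm described above might look like this in code. Note one assumption that goes beyond the slides: inverse document frequency is computed as the logarithm of (number of documents / documents containing the word), a common variant, rather than a plain ratio. The names `tf_idf_scores` and `search` are made up for this sketch, and words are split on whitespace only.

```python
import math
from collections import Counter


def tf_idf_scores(documents, word):
    """Return {doc_id: TF-IDF score} for documents that contain the word."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    docs_with_word = sum(1 for words in tokenized.values() if word in words)
    if docs_with_word == 0:
        return {}
    # Log-scaled inverse document frequency (an assumed, common variant).
    idf = math.log(len(tokenized) / docs_with_word)
    scores = {}
    for doc_id, words in tokenized.items():
        if not words:
            continue
        term_frequency = Counter(words)[word] / len(words)
        if term_frequency > 0:
            scores[doc_id] = term_frequency * idf
    return scores


def search(documents, word):
    """Return matching document IDs, best TF-IDF score first."""
    scores = tf_idf_scores(documents, word.lower())
    return sorted(scores, key=scores.get, reverse=True)
```

A word that appears in every document gets a score of zero everywhere, which is the intended effect for words like "the".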


PageRank: Searching the Web 


* Google's original trick. Inspired by citations of academic papers.
* Instead of relying entirely on the contents of a page, also look at the page's backlinks and the anchor text for those links.
* A page with lots of inbound links means it might be more useful.
* The anchor text on those links is treated like additional keywords for the page.
* A given backlink is weighted by how many other links are on the page it's coming from
* A dampening factor means we don't follow links forever without losing weight on them.
* Today Google has moved way past things like PageRank and TF/IDF. Deep learning is said to play a big role in ranking for example.

PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where T1..Tn are the pages that link to A, C(T) is the number of links going out of page T, and d is the dampening factor.
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MESSAGE QUEUES 


Message Queues as a Scaling tool 


Diagram: Publishers / Producers -> queue -> Subscribed consumers


* Decouples producers & consumers
  * So if the consumers get backed up, that's OK.
* Example: Amazon SQS service
* Single-consumer vs. pub/sub
* This is different from streaming data (generally real-time, massive data)


Apache Spark


* Distributed processing framework for big data
* In-memory caching, optimized query execution
* Supports Java, Scala, Python, and R
* Supports code reuse across
  * Batch processing
  * Interactive Queries
    * Spark SQL
  * Real-time Analytics
  * Machine Learning
    * MLLib
  * Graph Processing
* Spark Streaming
  * Integrated with Kinesis, Kafka, on EMR
* Spark is NOT meant for OLTP
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How Spark Works 


Diagram: Driver Program (Spark Context) -> Cluster Manager -> three Executors, each with a Cache and Tasks



Spark apps are run as 
independent processes on a 
cluster 

The SparkContext (driver 
program) coordinates them 
SparkContext works through a 
Cluster Manager 

Executors run computations and 
store data 

SparkContext sends application 
code and tasks to executors 
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Spark Streaming Spark SQL MLLib GraphX 


Spark Streaming
* Real-time streaming analytics
* Structured streaming
* Twitter, Kafka, Flume, HDFS, ZeroMQ

Spark SQL
* Up to 100x faster than MapReduce
* JDBC, ODBC, JSON, HDFS, ORC, ...

MLLib
* Classification, regression, clustering, collaborative filtering, pattern mining
* Read from HDFS, HBase...

GraphX
* Graph Processing
* ETL, analysis, iterative graph computation
* No longer widely used


SPARK CORE


Memory management, fault recovery, scheduling, distribute & monitor jobs, interact with storage
Scala, Python, Java, R
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CLOUD COMPUTING: 
a very brief review 


Category | Amazon Web Services (AWS) | Google Cloud | Microsoft Azure
Storage | S3 | Cloud Storage | Disk, Blob, or Data Lake Storage
Compute | EC2 | Compute Engine | Virtual Machines
NoSQL | DynamoDB | Bigtable | CosmosDB / Table Storage
Containers | Kubernetes / ECR / ECS | Kubernetes | Kubernetes
Data streams | Kinesis | DataFlow | Stream Analytics
Spark / Hadoop | EMR | Dataproc | Databricks
Data warehouse | Redshift | BigQuery | Azure SQL / Database
Caching | ElastiCache (Redis) | Memorystore (Redis or Memcached) | Redis
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Example: Design a Data Warehouse for Log Data with AWS 


Diagram: Server logs -> Amazon Kinesis Data Firehose -> Amazon S3 -> ...
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Hybrid Cloud 


* Combine your own data centers 
(“on-premises” or "private cloud") 
with a public cloud (AWS, Google, 
Azure, etc.) 

* Allows easy scaling of on-premises 
systems 

* Allows for regulations that require 
certain data to be on-premises 

* Requires bridges between your data 
center and the cloud 

* The specifics vary by cloud 
provider 

* "Multi-Cloud" — more than one 
public cloud provider 


Diagram: Public cloud + Private cloud = Hybrid cloud
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Start by Clarifying Requirements 


You will be given some incredibly vague problem, e.g., "Design YouTube". It is up to you to turn this
into concrete requirements your system must meet.


Start by repeating the question and confirming with the interviewer that you understand it.


ASK LOTS OF QUESTIONS 
THINK OUT LOUD 


Working Backwards 


* Start from the customer experience to define your requirements
* (This will gain MAJOR POINTS at Amazon, but works in general.)
* YouTube Example:
  * How will users discover videos? Do we need to think about building a search engine? A recommender engine? An advertising engine?
* Use this to limit the scope of what you're being asked to do.
* Understand the customer experience you are being asked to deliver.


Working Backwards


* Identify WHO are the customers
* WHAT are their use cases
* WHICH use cases do you need to concern yourself with
* You're not going to design all of YouTube in 20 minutes.
* Your initial task is to CLARIFY THE REQUIREMENTS of what you are designing.
* Your interviewer wants to see that you can think about problems from a business
perspective and not a purely technical one.
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Defining scaling 
requirements 


* Nail down the scale of the system. Is it hundreds of users? Millions? 
* This will inform you on the need for horizontal partitioning 
* How often are users coming? What transaction rate do you need to support? 
* Also define the scale of the data. 
* Hundreds of videos? Millions? 
* YouTube example: millions of users, millions of videos. 
* You will need every trick in the book for horizontally scaled servers and data storage. 
* Some internal tool might not need this level of complexity, however. 
* Always prefer the simplest solution that will work. 
* Vertical scaling still has its place. 


Defining latency 
requirements 


* How fast is fast enough? 
* This informs the need for caching and CDN usage 


* (Caching is also a tool for scaling, however; it reduces load on services & data stores)
* Try to express this in SLA language (e.g., 100ms at three-nines for a given operation)
* YouTube example:
  * Caching video recommendations
  * Caching video metadata, descriptions, etc.


Defining availability 
requirements 


* How much downtime can you tolerate?
* Is being down a threat to the business? Or just an inconvenience?
* If the former, you need to design for high availability
  * Opt for redundancy across many regions / racks / data centers rather than for simplicity or frugality


They might not tell 


* Work backwards from the customer to
estimate what sorts of requirements make sense
from their standpoint.


* "Back of the envelope" calculations may be
needed. (How many users and videos *does*
YouTube have? You can make an educated guess.)


* Get buy-in from the interviewer before 
proceeding to design the system. 


Think Out 
Loud 


* Don't just clam up for ten minutes while
you think about things.


* Clarify requirements, define the
constraints of what you need to build.


* Think out loud about the solutions you're
considering to meet those requirements


* Give the interviewer a chance to steer 
you in a different direction before you 
start diving into details 


* You don't know how much time you have 
for this part of the interview, so make 
every minute count. 


Sketching Out Your Design 


* Start with high-level components
* Work backwards if you can (especially at Amazon)
* Then flesh out each component as time permits
  * How do they scale?
  * How are they distributed for availability?
* Let the interviewer talk, listen to them. They may be trying to steer you in the right direction.
* Identify bottlenecks, maintenance, and cost concerns as you go. Show that you understand the tradeoffs of the choices you are making.
* Notation and format generally don't matter much, as long as you can communicate what it means.


Diagram labels: Clients; Web servers; Recommendation servers; memcached (x2); Purchase service; Item similarity service; Purchase DB
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Be Honest 


* Don't pretend to know stuff you don't
know. That won't end well.


* If you're steered into a direction you're
unfamiliar with, say so.


* But don't just give up! Try to think through
it, working with the interviewer to come up
with a solution collaboratively.


* This is an opportunity to demonstrate grit,
perseverance, and the ability to work with
others, which is more important than
anything.


Defending 


Your Design 


* The interviewer will try to poke holes in your design.
  * What happens if X fails?
  * What happens if we get a sudden surge of traffic / data?
  * Did you meet the scaling & availability requirements you defined?
  * Does your system meet all of the use cases discussed?
* How would you make it better?
  * How would you optimize or simplify it?
  * What is its operational burden? How will you monitor it?
* DON'T GET DEFENSIVE. Take feedback constructively.



Design a URL shortening service. OK, so we're talking 
about something like bit.ly, right? A service where 
anyone can enter a URL, get a shorter URL to use in 
its place, and we manage redirecting them? 


What sort of scale are we talking about? 


Any restrictions on the characters we use? Symbols
might be a little too hard for people to remember or
type...


Well, how short is short?


a-z 0-9 = 36 characters 


299 S 215 Sese 
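
To get a feel for "how short is short", the sketch below counts how many distinct short codes a 36-character alphabet gives at each length, and shows one hypothetical way to turn a numeric ID into a code. The names `ALPHABET`, `capacity`, `encode`, and `decode` are assumptions for this illustration; the slides do not prescribe how codes are generated.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits  # a-z 0-9 = 36 characters


def capacity(length: int) -> int:
    """Number of distinct codes of exactly this length."""
    return len(ALPHABET) ** length


def encode(number: int) -> str:
    """Turn a non-negative integer ID into a base-36 short code."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Turn a base-36 short code back into its integer ID."""
    number = 0
    for character in code:
        number = number * len(ALPHABET) + ALPHABET.index(character)
    return number
```

Six characters already allow a little over two billion codes, so the expected number of URLs drives how long the code needs to be.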


How about vanity URL's? Can people specify their own 
URL if it's available? 


Do we let them edit and delete short URL's once 
created? 


How long do shortened URL's last? 


Try It Yourself! 


* What API's would you need to implement this system?
* How does the redirection work at massive scale?


Add new URL
POST long URL, user ID (optional) -> status, short URL
- What if someone else shortened it earlier?

Add new vanity URL
POST long URL, user ID, vanity URL -> status

Update URL
PATCH long URL, user ID, updated long URL -> status, existing short URL

Delete URL
DELETE long URL, user ID -> status

Display mapping
GET long URL, user ID -> status, short URL

Redirect
GET short URL -> redirect to long URL


Diagram labels: GET /abc123; Geo-routing; Load Balancer; URLs table (Short URL | Long URL | user ID)
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DEBRIEF 




Started by repeating the question and 
clarifying requirements 


Worked backwards from the customer 
experience 


The API was critical to the operation of the 
system, so we started by thinking about the 
specific operations we needed to support 


Proposed a horizontally scalable fleet of app 
servers, distributed to maximize availability 


Proposed an appropriate distributed database 


Did not get defensive when challenged by the 
interviewer 


We at least mentioned security, availability, 
and scaling concerns along the way. 
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ee 


Design a restaurant reservation system.


Alright, let's think about the user
experience first. A user will want
to select a restaurant, ... party size,
find a list of available times near the ...,
lock in their reservation ... some sort of
confirmation ... SMS or something ...
need some ... cancel reservation ...


So there are probably thousands of
restaurants out there that ...
part of this system, and ...
hundreds of thousands of diners.
They'll expect this ... and reliable. ...
should optimize for
reliability over cost


I suppose the restaurant is also a
customer... what ...
Reporting, analytics, ...
how many tables and their
configurations, how ...
hold aside ...
contact reservation ...


Try It Yourself! 


* How would you organize the data needed for this system?


* How would you design a system that reliably scales to thousands of restaurants and
hundreds of thousands of users?


Let me sketch some
thoughts on the data
we'll need while I'm
thinking of it...


Restaurants
* ID
* (... seats per table)
* Reservation length
* Business hours
* Location
* Phone #

Reservations
* ID
* Customer ID
* Restaurant ID
* Time slot
* Party size
* Notes (special occasion, dietary restrictions, etc.)

Customers
* ID
* Primary contact name
* Phone #
* Email
* Preferences


Diagram labels: Geo-routing; Load Balancer; SMS notifications; Customers, Reservations, and Restaurants tables


DEBRIEF 


* Started by clarifying requirements and the
scale of the system


* Worked backwards from the customer
experience


* Thought through the data needed and how it
relates to each other, and how to efficiently
store it


* Proposed a horizontally scalable fleet of app
servers, distributed to maximize availability


* Proposed an appropriate distributed database


* Did not get defensive when challenged by the
interviewer


* Made the design better within the time
available (with addition of caching)


CRAWLER - 


We're designing a web crawler. Like, the entire web — 
or just a few sites? 


I thought you might say that. So we're talking, like,
billions of web pages. Crawled how often?


And, we need to check pages we've crawled before to 
see if they have been updated, right? 


OK, do we need to store a copy of every page as we 
go? Does that include images? 


What about dynamic content? Stuff that's rendered 
client-side? 


What's the main purpose of this crawler? I should've
asked that first really.


Try It Yourself! 


* How would you distribute this crawler to handle the massive scale required?


* What algorithm(s) will you use to crawl the web?


* What problems and failure modes can you anticipate and address?
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Education 


Diagram labels: Initial URL's; Queue of URL's to crawl; Page downloader; Content hashes; URL extraction, normalization; URL filter / stoplist; Distributed storage; URL's processed
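
A single-machine sketch of the pipeline in the diagram, to show how the pieces fit together: a queue of URL's to crawl, a downloader, content hashes for spotting duplicate pages, URL normalization, and a filter. Everything here is an assumption made for illustration: `fetch`, `extract_links`, and `is_allowed` are hypothetical callables supplied by the caller, and a real crawler would distribute the queue and storage across many machines.

```python
import hashlib
from collections import deque
from urllib.parse import urldefrag, urljoin


def crawl(seed_urls, fetch, extract_links, is_allowed=lambda url: True, max_pages=100):
    """Breadth-first crawl; returns {url: content} for the pages that were stored."""
    queue = deque(seed_urls)          # queue of URL's to crawl
    seen_urls = set(seed_urls)        # URL's already queued or processed
    seen_hashes = set()               # content hashes, to skip duplicate pages
    stored = {}                       # stands in for distributed storage
    while queue and len(stored) < max_pages:
        url = queue.popleft()
        content = fetch(url)          # page downloader (hypothetical callable)
        if content is None:
            continue
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue                  # same content already stored under another URL
        seen_hashes.add(digest)
        stored[url] = content
        for link in extract_links(content):
            # Normalization: resolve relative links and drop #fragments.
            normalized, _fragment = urldefrag(urljoin(url, link))
            if normalized not in seen_urls and is_allowed(normalized):
                seen_urls.add(normalized)
                queue.append(normalized)
    return stored
```

The FIFO queue makes this a breadth-first traversal of the web graph, and the `seen_urls` set is what keeps cycles between pages from looping forever.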
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DEBRIEF 


* Started by clarifying requirements and the
scale of the system


* This isn't a customer-facing system, so we
just worked backward from those
requirements.


* We started with a high level design, and
refined it as time permitted.


* We demonstrated knowledge of data
structures and algorithms and how to apply
them


* We addressed ways to scale things that don't
have out of the box solutions


* We struck a collaborative tone with the
interviewer when working through issues
with the design


* We demonstrated a desire for simplicity

This is not the only 
solution! 


* The interviewer is not concerned with you getting exactly the
same architecture they are using in the real world.


* They just want to see your thought process. So think out
loud, sketch on the whiteboard, let them see how you are
attacking the problem.

* This gives the interviewer the opportunity to steer you in the
right direction as well (or away from the wrong one.) Partly
they just want to see how you take criticism and feedback.


* It's the tools and design patterns that are important, not this
specific architecture.


Design a Top-Sellers
feature for an e-commerce
website.


OK, we're designing the
system that ...
sellers. Is ...


Right, over what period
of time are we
computing ...


Maybe we just look at all
sales, but give ...
less ...


What sort of scale are
we looking at here?


Try It Yourself! 


* You've actually gotten through most of the test already on this one... when I used this
question at Amazon, I was really using it to see if people could see things from the
customer's perspective and anticipate these sorts of issues before they even started
designing.


* But you've gotten past that! What sort of system design would fulfill the top-sellers
feature we've discussed?
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Purchases (item 
ID, category, 
date purchased) 
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Distributed 
cache 


Top-seller job 
(Apache Spark?) Top-Sellers 
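
One way to picture the top-seller job is as a batch aggregation over the purchases feed. The plain-Python sketch below stands in for what a Spark job might do at scale. The exponential time decay, its `half_life_days` value, and the function name are assumptions made for illustration, chosen so that recent purchases count for more than old ones.

```python
import heapq
from collections import defaultdict
from datetime import date


def top_sellers(purchases, today, n=3, half_life_days=7.0):
    """Return {category: [item_id, ...]} ranked by time-decayed purchase counts.

    `purchases` is an iterable of (item_id, category, date_purchased) tuples.
    A purchase that is `half_life_days` old counts half as much as one made today.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for item_id, category, purchased in purchases:
        age_days = (today - purchased).days
        if age_days < 0:
            continue  # ignore records dated in the future
        scores[category][item_id] += 0.5 ** (age_days / half_life_days)
    return {
        category: heapq.nlargest(n, items, key=items.get)
        for category, items in scores.items()
    }


purchases = [
    ("book-1", "books", date(2024, 1, 1)),
    ("book-1", "books", date(2024, 1, 2)),
    ("book-2", "books", date(2024, 1, 29)),
    ("toy-1", "toys", date(2024, 1, 30)),
]
result = top_sellers(purchases, today=date(2024, 1, 30))
```

Here `book-2` outranks `book-1` even though it sold fewer copies, because its single sale is recent. The per-category lists are what the job would write to the distributed cache for the web tier to read.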
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DEBRIEF 


* Started by clarifying requirements and the
scale of the system


* Asking the right questions was half of the
interview! The design part was relatively
simple.

* Thinking about what customers expected to
see was the key.


* Having a good toolchest available helped with
the design (an S3 data lake, Apache Spark, and
DynamoDB, for example).


* We faced the cold start problem for caches.


* We struck a collaborative tone with the
interviewer when working through issues
with the design


* We demonstrated a desire for simplicity




I need to design a video sharing service. So, we're
talking something like YouTube?


YouTube has a lot of features... recommendations,
channels, advertising... it's not just storing and
playing back videos. What features should I focus
on?


So, we're talking about users and videos in the billions,
and people uploading and watching all around the
world, right?


Alright, so it seems like there are two things I need
to cover: handling video uploads, and handling video
playback. At massive scale. Sounds right?


Try It Yourself! 


* How would you design a system to allow users to upload, transcode, and vend videos 
around the world efficiently? 


* Having a good toolchest at your disposal is key; you aren't expected to develop the various 
sub-components from scratch if there are cloud solutions available. 


* But, do think about the cost of these services and how you might keep those costs down. 
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Video playback 


Web servers return video URL & metadata to client


Transcoded videos: distributed object storage, i.e., Google Cloud Storage


Video metadata: NoSQL, i.e., BigTable
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Video uploading 


[Diagram: video uploading. Labels recovered: video metadata; raw video; transcoding fleet; "Transcoding complete"; where-to-host ML model; transcoded video]
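
The upload path in the diagram can be sketched as two queues and a hosting decision. Everything specific here is an assumption for illustration: in-process `queue.Queue` objects stand in for a real message queue, `predicted_views` stands in for the output of the where-to-host model, and `RESOLUTIONS` and `CDN_VIEW_THRESHOLD` are made-up values.

```python
import queue
from dataclasses import dataclass, field

RESOLUTIONS = ("1080p", "720p", "360p")  # hypothetical output formats
CDN_VIEW_THRESHOLD = 10_000              # hypothetical cutoff for paying for CDN hosting


@dataclass
class Video:
    video_id: str
    predicted_views: int                 # stand-in for the where-to-host model's output
    renditions: list = field(default_factory=list)
    host: str = "pending"


def transcode(video):
    """Pretend to transcode the raw upload into each target resolution."""
    video.renditions = [f"{video.video_id}_{res}.mp4" for res in RESOLUTIONS]
    return video


def choose_host(video):
    """Send only videos expected to be popular to the (more expensive) CDN."""
    return "cdn" if video.predicted_views >= CDN_VIEW_THRESHOLD else "object-storage"


def run_pipeline(uploads):
    raw_queue = queue.Queue()            # raw videos waiting for the transcoding fleet
    done_queue = queue.Queue()           # "transcoding complete" notifications
    for video in uploads:
        raw_queue.put(video)
    while not raw_queue.empty():         # a real fleet would run many workers in parallel
        done_queue.put(transcode(raw_queue.get()))
    hosted = []
    while not done_queue.empty():
        video = done_queue.get()
        video.host = choose_host(video)
        hosted.append(video)
    return hosted
```

Decoupling upload from transcoding with a queue lets the transcoding fleet scale independently of the web servers, and keeping the hosting decision in one small function makes the cost trade-off easy to tune.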
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DEBRIEF 


* The test here was if you could handle a question 
as open-ended as "design YouTube" and break it 
down into manageable pieces. 


* You had to start with questions as to what the 
interviewer wanted you to focus on. 


* Again in our designs, we always started with the 
client / customer and worked backward. 


* We kept things high-level at first, to ensure we 
had a cohesive design in the time we had. We 
didn't get into how BigTable works etc. 


* We proposed technical solutions to business 
problems (controlling CDN costs) 


* We got to apply message queues, NoSQL,
stateless servers, CDNs, and even machine
learning in our solution.


* Again there are many ways to approach this 
problem; this is only one. 




Design a Search Engine 


Are we talking about a search engine for
the entire web, like Google, or just some
intranet tool?


OK. We designed a web craw…
can assume we have a di…


generating reasonable se…
massive scale?


Try It Yourself! 


* What data would you want to extract to measure how relevant a page is to the keywords
within it?


* What algorithms would you use to map keywords to pages, and sort them?


* What system architecture would allow you to do all this at ludicrous scale?
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Term frequency /
document frequency


Inverted index 


Keyword -> sorted list of documents 


Backlinks 

Terms in the doc 

Their position 

Font size / headings / etc. (formatting) 
Titles 

Length of document 

Term frequency 

Metadata 
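
To make "keyword -> sorted list of documents" concrete, here is a small in-memory sketch of an inverted index. The TF-IDF score is used only as a placeholder relevance signal; a fuller ranking would fold in the other signals listed above, such as backlinks, position, and formatting. The function names and the tokenizer are assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each keyword to a list of (doc_id, tf-idf score), best score first.

    `docs` maps doc_id -> text.
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())   # number of documents containing each term
    index = defaultdict(list)
    for doc_id, counts in term_counts.items():
        length = sum(counts.values())
        for term, count in counts.items():
            tf = count / length                          # normalized by document length
            idf = math.log(len(docs) / doc_freq[term])   # rarer terms weigh more
            index[term].append((doc_id, tf * idf))
    for postings in index.values():
        # Highest score first; ties broken by doc_id so the order is stable.
        postings.sort(key=lambda posting: (-posting[1], posting[0]))
    return dict(index)


def search(index, keyword, limit=10):
    """Return up to `limit` doc_ids for a single keyword, most relevant first."""
    return [doc_id for doc_id, _score in index.get(keyword.lower(), [])[:limit]]
```

At web scale the index would be partitioned across many machines and built in batch, but each partition would still hold this same keyword-to-postings mapping.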
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Web 
repository 


Indexer | Backlinks | URL normalizer


Doc -> keywords, position, formatting, other signals
doc -> doc


Keyword -> doc


Keyword -> doc, relevance signals
(Keep this sorted by keyword, merge sort or something)


PageRank algorithm
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PageRank 
algorithm 


Scoring, 
sorting, 
ranking 


Inverted 
Index 




Keyword -> doc1, doc2, ... 


Front-end 


Happy 
searchers 
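
For readers who have not seen PageRank written down, the following is a textbook power-iteration sketch over a backlink graph. It is not a description of how Google computes it today; the damping factor of 0.85 and the fixed iteration count are conventional illustrative choices, and the function name is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks
```

For example, with `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` ends up ranked highest because two pages link to it, and `b` lowest because nothing links to it. The resulting scores would be one of the relevance signals fed into scoring, sorting, and ranking.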
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DEBRIEF 


* This is a very complex problem, and if you're not already
familiar with how Google was originally designed, you're
being asked to be as smart as Sergey Brin and Larry Page
here.


* Although we did not work backwards in the design strictly
speaking, we did start with the knowledge that we wanted
an inverted index in the end, and had a web page
repository in the beginning, and we had to figure out how
to get between the two.


* Given the complexity, we focused on high-level system
architecture and did not go into detail on any given
sub-component unless time later allowed.


* Honesty is critical. It's OK to say you don't know the
details of PageRank or how Google works today, but you
should acknowledge that and what problems you are
leaving unsolved.


* This is not the only solution nor the best one. But that's
not important; what was important was how you reacted
to questions and feedback from the interviewer, and that
you "thought out loud" to give the interviewer an
opportunity to steer you in the right direction.


* Listen to hints from the interviewer; we quickly
abandoned TF/IDF since we were subtly being steered
away from it.




What are hiring 
managers really 
looking for? 


* Determination.

* Grit.

* Perseverance.

* Whatever you want to call it.


Technology changes
quickly. Your
determination to
quickly learn that
technology does not.


Yes, you STILL NEED TO 
PROVE YOU CAN CODE. 
On a whiteboard or 
digital equivalent. 


Tech skills matter, but
they are just table
stakes.


How do you demonstrate
perseverance?


Tell them a story.


Hiring managers practice "behavioral interviewing".

They want to know how you have reacted to specific challenges in the past.
Come prepared with STORIES about how YOU solved challenging problems on
your own. The interviewer will dig into details.


Be Ready for... 


As you go through the interview process, you'll be interviewed by engineers, 
architects, managers, and someone like me. 


- Technical Skills 
Coding-at-the-whiteboard; system design problems. 
- Your Experience 
STORIES about tough problems you had to solve; be prepared to dive deep. 


- Your Fit with Company Values 
Research what they are, and have STORIES ready to demonstrate you possess 
them. 


What They Want 


These are signs of the perseverance hiring managers seek. 


- Independent Thought 
Can you research solutions to new problems on your own? 


- Independent Learning 
When faced with a new technology, can you quickly learn it on your own? (Hey, 
Udemy can help!) 


- Never Give Up, Never Surrender 
Do you have the grit to see challenging problems through to completion? 


You must be self-motivated.


You shouldn't need to be told that
watching cat videos all day because
your boss didn't give you specific
instructions is not OK.


Have stories of your INITIATIVE. Did you 
take on new work on your own, or 
develop an idea of your own, in your 
spare time? Hiring managers LOVE that. 
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What They Don't Want 


- Let Me Google That For You 
People who constantly lean on others for basic guidance won't last long. 


- Step by Step Instructions 
If you can't accomplish anything without a recipe, you can't solve the new and 
unique problems your employer faces. 


- Failure of Focus 
You must appreciate that your work has ZERO value until it is deployed and in front 
of customers. 


Understand the 
Company 
* LEARN THE COMPANY VALUES


* i.e., Amazon's leadership values /
customer focus, Google's "ten things"


* Demonstrate these values in the
stories you tell!


[Screenshot: amazon, "Our Leadership Principles" (text cut off at the right edge)]


Our Leadership Principles aren't inspirational wall hangings. These Principles work hard, just like we do.
Amazonians use them, every day, whether they're discussing ideas for new projects, deciding on the best sol…
for a customer's problem, or interviewing candidates.


Customer Obsession
Leaders start with the customer and work backwards. They work vigorously to earn and keep customer trust.
Although leaders pay attention to competitors, they obsess over customers.


Ownership
Leaders are owners. They think long term and don't sacrifice long-term value for short-term results. They ac…
behalf of the entire company, beyond just their own team. They never say "that's not my job."


Invent and Simplify
Leaders expect and require innovation and invention from their teams and always find ways to simplify. The…


Practice Coding and 
Designing at the 
Whiteboard 


Writing code while someone is watching 
you takes some getting used to. 


There are plenty of sample coding 
exercises out there to practice with. 


Bring your Stamina


* Try to arrange your travel schedule so
you'll have time to acclimatize and rest.


* Don't show up tired. Exercise, drink
some energetic drink thing, whatever.


* Eat breakfast! And use the bathroom
before heading in.


Please take a shower. 


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I really shouldn…
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Think about your 
questions for them. 


* Nothing's worse than saying "um,
nope" when an interviewer says "do you
have any questions for me?"


* Display some curiosity. Ask about their
typical day at the company. Ask about
how career progression works. Ask
about *their* biggest challenge.
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Think Big. 


Think about larger systems, running at
massive scale. Any system you design
must hold up to petabytes of data /
thousands of transactions per second.
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Think about the business, and not just 
the technology. 
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Be nice. 


* They're not only evaluating your
technical skills. They're evaluating what
it's like to work with you.


* Smile. Ask about their jobs. Stay
positive. Stay humble.


Do your research. 


As a hiring manager, I hated websites
that collected interview questions from
specific companies.


But you should love them. 
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